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Abstract 

We consider a class of discrete time stochastic control problems motivated by some 
t— I financial applications. We use a pathwise stochastic control approach to provide a 

dual formulation of the problem. This enables us to develop a numerical technique 
for obtaining an estimate of the value function which improves on purely regression 
based methods. We demonstrate the competitiveness of the method on the example 
^ of a gas storage valuation problem. 

Q 

C\ 1 Introduction 

The numerical pricing of options with early exercise features, such as American options, 
is a challenging problem, especially when the dimension of the underlying asset increases. 

Ph There is a large body of literature which discusses this problem from different points 

of view, beginning with techniques aimed at solving the dynamic programming problem 
using trees or the associated Hamilton-Jacobi-Bellman equation. Over the past decade, 
there has been a lot of activity in developing Monte Carlo techniques for such optimal 
stopping problems. The most popular have been basis function regression methods ini- 
tially proposed in [13] and [21] . If these methods are used to provide an approximate 
optimal exercise strategy, they naturally provide lower bounds for prices. Thus they were 
soon followed by dual methods \W\ [T2] designed to find upper bounds. An account of 

^ these methods can be found in [TT] . 

Following on from the development of dual methods for American options, there has 

£NJ been a strand of research extending these ideas to multiple optimal stopping problems, 

which correspond to options with multiple exercise features |15j . The dual method pro- 
ceeds via the idea of pathwise optimization, which originated in [10J. This pathwise 
optimization method was developed in a general setting in [17] where it was applied to 

• ^h more general stochastic control problems. 

In this paper, our aim is to consider a subclass of such stochastic control problems 
for which we can develop a relatively simple dual approach and which leads to numerical 
algorithms for the efficient computation of the value function. 

We were originally motivated by option pricing problems in the electricity market. 
In that setting contracts such as swing options give the holder certain rights to exercise 
variable amounts through the lifetime of the contract. The dual approach, initiated in 
[15] , used a simplistic swing contract in which a single exercise was allowed on each day, 
with the total number of exercise rights over the lifetime of the contract constrained. 
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In [H [5] , this was extended to multiple discrete exercises on a given day. Other recent 
developments have seen a move to continuous time [6] and a 'pure martingale' dual 
formulation of the problem |19j . 

Our first aim in this paper is to provide a more general formulation of the dual 
problem in discrete time which allows exercise of continuous amounts and contains the 
'pure martingale' approach. 

Our second aim is to provide a useful numerical approach to this type of problem. 
Having moved beyond the multiple optimal stopping problem to a more general stochastic 
control formulation, the space of controls is now potentially of dimension greater than 
one, and consequently more difficult to handle. Instead of a purely binary decision (or at 
most a finite set of decisions) at each time point, we have the possibility of choosing from a 
Euclidean space (in the electricity context, this is exercising a real amount corresponding 
to a volume of power) . Our dual formulation of the problem leads naturally to an upper 
bound on the value function. We develop a technique based on being given an a priori 
estimate for the value function, say typically an estimate obtained via basis function 
regression, and converting this to an improved estimate via the dual. 

In order to produce the a priori estimate, the method uses least squares regression and 
Monte Carlo techniques, an extension of the approach due to Longstaff-Schwarz [13] and 
Tsitsiklis and van Roy [21]; we use test functions that depend on both the underlying 
factor and the control value. We note that this idea has been considered by Boogert 
and Jong [JJ; however, Boogert and Jong did not develop the extended regression based 
method in detail, but worked with regression depending only on the underlying factor for 
several discrete values of the control. Belomestny et al. [I] have also developed a family 
of least squares regression and Monte-Carlo based numerical algorithms. The algorithm 
in [1] can be applied to more general discrete time control problems than the ones we 
consider in this paper. However, as in [7J, Belomestny et al. regress the conditional 
expectation arising in the dynamic programming principle using test functions depending 
on the underlying factor only. When applied to the same control problem, with the right 
choice of test functions and grid in the space of underlying factor and control, we found 
that our extended regression based method performs better than the method in [7J or the 
method in [3], especially when the control is high dimensional. 

The a priori estimate is used as an input to the dual formulation based upper bound. 
The implementation of the dual estimate requires the numerical solution of several in- 
dependent deterministic optimal control problems. We note that these control problems 
can be solved simultaneously, and, hence, it is well suited for a parallel implementation. 

As an application, we will focus on one example in this paper, namely natural gas 
storage valuation. The owner of a natural gas storage facility is faced with an optimal 
control problem in order to maximize the return from running the facility. The demand 
for natural gas is seasonal with high demand and prices in the winter, and low demand 
in the summer. The operator of a facility will want to buy and store gas when it is 
cheaper over the summer, and then sell gas into the market when the price is higher in 
the winter. The operation of the facility is thus a control problem where, on a given day, 
the operator has the decision to inject or produce a volume of gas, given the current price 
of gas. Thus, we have the set up of a stochastic control problem of the type we consider 
here. We chose the particular gas storage problem as a numerical example in order to 
compare the results of our probabilistic approach to the results of the partial differential 
equation based methods (cf. [U [20]). In general, we expect the probabilistic approach 
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to perform better than the PDE methods when the dimension of the underlying factor 
and/or the dimension of the control is high. 

Our numerical example demonstrates that the dual formulation based upper bound 
is sharper than the one we get from the a priori estimate at comparable computational 
expense. This empirical observation justifies the potential benefit of computing the dual 
formulation based estimate in practice. 

The outline of the paper is as follows. We will begin with the setup for the problem 
in Section 2 and follow this with the dual formulation in Section 3. We obtain our main 
representation in Theorem 3.1 and then derive a version which can be used for the Monte 
Carlo based numerical technique in Lemma |3.3[ We follow this with a discussion of the 
numerical technique itself in Section 4. Finally, we apply the approach to the gas storage 
problem in the last section. 



2 Discrete time decoupled stochastic control problems 

We consider an economy in discrete time defined up to a finite time horizon T. We 
assume a financial market described by the filtered probability space (^,-T 7 , (Ftjt&T^)-, 
where T = {0,1,..., T}. We take (XtjteT to be an M^-valued discrete time Markov 
chain representing the price of the underlying assets and any other variables that affect 
the dynamics of the underlyings. We assume that the filtration (J-t)teT is generated by X. 
Moreover, we assume that P is a risk neutral pricing measure, and write Ej(X) = E(X|J r t) 
for any random variable X on our probability space. Throughout the paper, we will 
assume that interest rates are 0. 

We phrase our problem in the language of options, even though it is a standard 
stochastic control problem of maximizing a reward obtained from a randomly evolving 
system. The payoff of the option (or the reward for the position) Ht(h, Xt) at time t £ T 
is a function of the underlying [Xt)teT an d the exercise amount h € K , which is chosen 
by the holder of the option subject to certain constraints. The problems that we consider 
are decoupled in that the decision at time t regarding ht has no impact on the evolution 
of the underlying state of the economy X s for s > t. 

The set of admissible exercise decisions available at a given time is defined by a 
(set-valued) function K on T x M. 1 x M. d that takes values in the set of subsets of M. 1 . 
The control process (yt)teT C M. 1 is defined by t > 0, yt+i := yt — h t , for the exercise 
amount h t . We will write K t (y t , X t (u)), or, if needed, K(t,yt, X t (uj)), for the given set 
of admissible exercise decisions depending on t, the state of the underlyings Xt, and the 
value of the control process yt- The initial value yo and the constraints K t (-,-) for t £ T 
are determined by the option contract. 

Definition 2.1 (i^-admissible exercise policy). A policy, or exercise strategy, ir = (ht, . . . , /it) 
is a K -admissible exercise policy on {t , . . . , T} started at y if it satisfies all of the following 
properties. 

(i) h s is T s -measurable for s = t, . . . ,T , 

(ii) and h s (uj) £ K s (y s ,X s (uj)) for all s = t, . . . ,T and for all uj in a set of probability 
one, 

where y s is defined recursively by yt = y and y s+ i = y s — h s for s = t, . . . ,T — 1. 
The set of such policies is denoted by Vx,y,t- 
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Thus, a X- admissible exercise policy tt on the time-set {t, . . . , T} is defined by the 
^-adapted process (h s ) s =t,...,T describing the exercise decisions at times t, . . . ,T, and the 
value of such an exercise policy tt at time t is given by 



V t K >*(y t ,x)=E 



Y,H s {h S} X s )\X t 



s=t 



(2.1) 



In the particular examples considered in this paper, the set Kt will be a line segment in 
R or a quadrant of M 2 . 

We are now in a position to define the value function V t K '*(yt, X t ) at time t of the 
option satisfying the constraints K. 

Definition 2.2. We define the value function to be 



V t K '*(y,x) = sup V t K,7T (y, x) = sup E 



For simplicity, we make the following assumption. 



T 

YH s (h s ,X s )\X t 



s=t 



, (y,x) GM 1 x 



Assumption 2.3. There exists a set of initial control values and a bound C such 
that 

E[\H s (h,X s )\]<C VseT,heK s ( y ,X s ), 
for all (y,X s ) reachable at time s from Yq x {Xo(uj)\uj £ J!) by a K-admissible policy. 

This is enough to ensure the existence of the value function and the dynamic pro- 
gramming principle. Weaker assumptions which guarantee existence would be possible, 
but are not the focus of this paper. 

In order to indicate the type of problems that fit into this framework, we give three 
examples. In the final section, we will focus on the first. 

Gas storage valuation: 

Natural gas storage valuation and optimal operation can be formulated as an option 
contract as described above. In particular, let Xt £ K denote the spot price of natural 
gas at time t, and let yt G M denote the amount of gas stored in the facility at time t. In 
that case, with ht denoting the change of level between t and t + 1, the payoff is defined 
by 

H t (ht,X t ) = h t X t . 

More accurate models may also take into account the loss of gas occurring at injection. 

At time t, the set Kt(yt,Xt) is determined by the maximum and minimum capacity 
of the gas storage facility, and by the injection/production rate depending on the stored 
amount Xf. A continuous time description of this problem was given in [Tl] and in |2Uj . 
In section 5.1, we present a time-discretized version. 

Swing option pricing: 

In the electricity market, a swing option enables the holder to protect themselves against 
the risk of price spikes if they are exposed to the spot price of electricity X. The simplest 
versions give their holder the right, for a specified period of time, to purchase each day 
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(on- or off- peak time) electricity at a fixed price K (strike price). In this case, the payoff 
is that of a call {Xt — K) + . When exercising a swing option at a time t, the amount 
purchased may vary (or swing) between a minimum volume, mt, and a maximum volume, 
Mt, while the total quantity purchased for the period must remain within minimum m 
and maximum M volume levels. Thus, we have 

H t (h t , X t ) = h t max{X t - K, 0), 

with mt < ht < Mt and fh < Ylt=o < M. The set Kt is the line segment determined 
by these constraints. 

Optimal liquidation: 

A similar approach can be pursued to model the optimal liquidation of a portfolio of 
dependent assets. Let X t = (X^,X 2 ) G M 2 denote the value of two bonds by the same 
issuer with different issue dates. We assume that, whenever X 1 is traded, there is a 
temporary price impact on both, X 1 and X 2 , referred to as a multi-asset price impact. 
For example, if ht = (hj, h 2 ) G M + x M + denotes the quantities sold of the bonds X 1 and 
X 2 , respectively, the payoff is 

H t (h t , X t ) = h\x\ + h 2 X 2 - {hjKhtf 

for some (3 > 1/2 and some matrix A G M 2x2 . The reader is referred to [18] for further 
details. The one-dimensional case is considered in [2]. 

In this setting, the set Kt is determined by the total volume of each bond that we 
hold and wish to liquidate, and is hence a subset of M 2 . 

Many models (see for example [3]) consider modelling the permanent price impact 
of trades on top of incorporating the temporary impact. In the presence of permanent 
impact, the optimal liquidation problem is typically formulated as a coupled stochastic 
control problem. The approach based on coupled control problems falls beyond the scope 
of this paper. For certain models of permanent price impact (e.g. non-resilient impact), 



the a priori estimate presented in Section 4.2 can be easily adapted. Deriving a dual 
formulation, however, is less straightforward. 



3 Dual formulation 



Definition 2.2 represents the value of the option as the supremum over the set of admissible 



exercise policies. We now develop a dual for this problem that represents the option value 
as an infimum over a space of martingale-valued functions. Let M denote the space of 
functions defined on M! and taking values in the space A4q of martingales which are 
adapted to the filtration (T t )teT and null at time 0. For M G M, y £ R l , t £ T, Mf 
denotes the time-t value of M(y) G A^o- 



Theorem 3.1. Let K be a function defined on T x 



of subsets of M 1 . Then, for all y G , the value V^'*(y, Xq) of the option at time 
almost surely satisfies the following. 



and taking values in the set 



inf IE 
AieM 



T-l 



sup Yl ( H t( h *' X t) ~ M t+1 + M t W ) + H T (h T , X T ) 



Xt) = X 



(3-1) 
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Moreover, 



V K '*(y,x)=E 



T-l 

sup ^(H t {ht,X t ) - i<f +1 +M t K ' yt+l ) +H T (h T ,X T ) 



X n = x 



where M K < y G Mo such that 



(3.2) 



i<f := M« > y + V t K + t{yX t+1 ) - E t \V t ^(y,X t+1 ) 
Proof. We follow a similar approach to that of Rogers [T7] . We have 

T 

X = x 



rK,*, 



'*(y, x )= su p E 



sup E 



Y, H s(h s ,X. 

s=0 
T-l 

£ {H s {h s ,X s ) - MJ4+ 1 + Mf + 1 ) + flr^, X T ) 



< E 



s=0 
T-l 



X = X 



sup £ {H s (h s ,X s ) - Mj;+ X + + 1 ) + fl- r (^r, X T ) 



Xn = X 



As this holds for all martingales M w , we then have 

T-l 



V K '*(y,x) < inf E 



A/eM 



sup Yl i H s(h s ,X s ) - Mf£ + + 1 ) + £T r (fcT, *t) 



To see that the inequality holds the other way around, we consider a particular family 
of martingales. The one that we take is {M^ ,Vt ,t G 7~\T} from the Doob decomposition 
of the value function. Thus, its increments are given by 

AM^' yt+1 = 7<f + 1 - M**»» = V t «?(y t+1 ,X t+l ) - E t [v£* (y t+1 , X t+1 ) 

Using this martingale, we have 



inf E 
A/eM 



sup ( (H s (h s ,X s ) - M^X 1 + + 1 ) + H T (h T , X T ) \ 



< E 



E 



'T-l 



sup I ^2(H a (h.,X a ) - AMf* +1 ) + H T (h T ,X T ) 



Xo = x 



Xo = x 



T-l 



sup \ Yl ( H s(hs,X s ) - V s K + '*(y s+1 ,X s+l ) + E s V s K + '*(y s+1 ,X s+1 ) 



+ Hxihr, Xf 



Xo = x 



By the definition of the value function V t K '*(-, •), for any (y, x) G M' x M. d , i e T, and 



h G K t (y,x), we have 



V^'*(y,x) > T/i(/i,x) + sup E 



J] ff s (^,X s )|X* = z 



.s=t+i 
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and, therefore, 



V t K '*(y,x) > H t (h,x) + E V t f*(y - h,X t+1 )\X t = x 



Hence, 



inf IE 
AieM 



'T-l 



sup \ Y,(H s (h s ,X s ) - + M^) + H T (h T , X T ) 



s=0 



o = x 



< E 



'T-l 



sup I {V s K >*(y s ,X s ) - V s K + '*(y s+1 ,X s+1 )) + H T (h T , X T ) 

*ePK,v,0 I s=0 



X = X 



V K '*(y,x)+E 



sup {H T (h T ,X T ) - Vf'*(h T ,X T )} 



Now, using the fact that at T we must have \jf'*(y,x) = Hx(y,x), we have 



inf E 

MeM 



'T-l 



sup <^ Yl ( H s(hs,X s ) - M y s $ + Mf + 1 ) + H T (hr, X T ) 

nePK,v,Q [s=0 



X = X 



as required. 



<V K '*(y,x) 



□ 



Remark 3.2. Consider the specification of the multiple stopping problem in [19J. The 
payoff function is Ht(h, x) = hx for t G T, the control satisfies < yo < T + 1 and takes 
non-negative integer values, and the constraint sets are defined by 



K t (y,x)=K t (y) 



{1} iiy>T-t + l, 
{0,1} ifT-t + l>y>0, 
{0} if y = 0. 



In this special case, the payoff value is either or Xt. Hence, (3.2) simplifies to the 
following. 



V K '*(yo,x)= inf E 
M 1 ,...,M k £Mo 



yo 

max y^(X tk - Mf°~ k + Ml 



0<tl<-<t„ n <T 



yo-k\ 
k 



k=l 



X = X 



The dual formulation in this form coincides with the result obtained in [19 



In general, we need to solve the deterministic control problem along the path in order 
to use this approach. If we have a good approximation to the value function, then we can 
use the martingale arising from its Doob decomposition, as this will be an approximation 
to the optimal martingale. 

We note that, if we are given a set of approximations to the value function, we can 
bound the error made in the upper bound arising from the dual formulation in terms of 
what are essentially the errors in the dynamic programming equations. 

More specifically, let Vt(y,x), t = 0, . . . ,T be a set of approximations to the value 
function and denote by V^(y,x) the associated upper bound on the value function. 
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Lemma 3.3. The difference between the a priori estimate for the value function and the 
associated estimate arising from the dual formulation can be expressed as 

V?(y,x)-V t (y,x) = 



E 



T-l 



sup ^2H s (h s ,X s )+E s [V s+1 {y s+1 ,X s+1 )} - V s (y s ,X s 

neT>K, y ,t s=t 



X t = x 



Proof. We choose the martingale in the upper bound to be the one generated by Vt(y, x), 
and, thus, 

M v t+l - Ml = V t+1 (y,X t+1 ) - E t [V t+1 (y,X t+1 )} . 
Substituting this into the dual formulation given in Theorem |3.1[ we have 



V?(y,x)=E 



T-l 



sup {H s (h s ,X s ) - + Mf + 1 } + H T (h T , X T ) 

K€PK,y,t s=t 



X t = x 



E 



T-l 



sup y^{H s (h s ,Xs) - V s+ i(y s+ i,X s+ i) + E S [V s+ i(y s+ i,X s+ i)}} 



+ Hx(hT, Xt) 



X t = x 



E 



T-l 



SU P ^2 { H s(h s ,X s ) - V s+1 (y s+ i, X s+1 ) + V s (y s ,X s ) 
irev K , y ,t s=t 

+ E S [V s+1 (y s+1 ,X s+1 )}-V s (y s ,X s )}+H T (h T ,X T ) 



X t = x 



E 



T-l 



sup ^2{H s (h s ,X s )+E s [V s+1 (y s+1 ,X s+ i)]-V s (y s ,X s )} 

K£V K ,y,t s=t 



+ V t (y t , X t ) - V T (y T , X T ) + H T (h T , X T ) 

T-l 



X t = x 



sup ^{H s (h s ,X s ) + E s [V s+1 (y s+ i,X s+1 )]-V s (y s ,X s )} 

"^K-S-i S=t 



= V t (y,x)+E 

as Vr(yT, Xt) = Hxihr, Xt), giving the required result. 



X t = x 
□ 



4 The numerical approach 

We now present a numerical implementation of the dual upper bound derived in Lemma 
The lemma gives a representation of the difference between an upper bound V^(y, x) 



3.3 



and another (a priori) approximation Vt(y, x) of V t K) * (y, x). In Section 4.1, we present a 



numerical method that approximates (y, x) given that approximations of the functions 
V t (y,x) and 

(x,y)^E[V t+1 (y,X t+1 )\X t = x] (4.1) 



are available. 



In Section 4.2 , we introduce an approach to generate an a priori estimate Vt (y, x) and 



an approximation of the conditional expectation (4.1) 
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4.1 Estimating the dual upper bound 

In this section, we assume that a set of a priori approximations Vt(y,x) is available, i.e., 
for t = 0, ...,T — 1, the function Vt(y,x) represents an approximation of V t K, *(y, x). 
Furthermore, we assume that, for t = 0, . . . , T — 1, the function Vt(x, y) and (an estimate 
of) 

(x,y)^M[V t+1 {y,X t+1 )\X t = x] 

can be computed for any time-t reachable pair (x,y). 

Under such assumptions, we intro duce a numer ical method that implements the upper 



estimate V ^(y,x) derived in Lemma 



3.3 



3.3 



requires the estimation of a path- 



Lemma 

wise optimum. Hence, given a trajectory x. = {xq, . . . ,xt}, we aim to approximate the 
function 



T-l 



F t (y,x.) := sup ^ {H s (h s , x s ) + E [V s+ i(y s+ i, X s+ i)\X s = x s ) - V s (y s , x s )} 

recursively for t = T, T — 1, . . . , 0. The optimization algorithm is based on the following 
path-wise dynamic programming principle. 



F T (y,x.)=0, 

F t (y,x.)= sup <! {H s (h s ,x s ) + E [V s+ i(y s+1 , X S+1 )\X S = x s ] - V s (y s ,x s )} 



>t-i 



K&VK,y 



t I S=t 



and 



sup \H t (h,x t )+E[V t+1 (y - h,X t+1 )\X t = x t ] 

h£K t (y,x t ) 1 

- Vt(y,x t ) +F t+1 (y - h,x.)}, 



Vj(y, x) = V (y, x) + E [F (y, X.)\X = x}. 



(4.2) 



(4.3) 



Based on (4.2) and ( |4.3| , we are now in a position to formulate the following algorithm. 

Algorithm 4.1. Generate N independent trajectories x t .,i = l,...,Nof the process X 
started at a fixed Xq. For i = 1, . . . , N 

1. Set t = T, and define y h-> Fx(y, x]) = 0. 

2. Sett - 1 -)■ t. 



3. Define a finite gird Q\ C Dom(Ft(-, x 1 .)) C M 1 (see Remark 4-2), and for each y <^Q\ 



solve the optimization problem 

F t (y,x i ) = 



sup {Ht{h,x\) +E [V t+1 (y - h,X t+1 )\X t = x\ 

y-heDom(F t+1 (-,x i )) 
heK t {y,xl) 



V t {y,x\) + F t+l {y-h,x 1 .)} 



4- Given the set {(y, F t (y, x l .))\y G G f}, define (interpolate) F t (-,x!) on the whole 
domain Dom(Ft(-, x 1 .)) (see Remark 4-%)- 
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5- Ift>l, continue with^ otherwise finish. 
Once y h-> Fo(y,x l ) is defined for all i = 1,...,N, we approximate V ^(y,Xo) by the 



Monte- Carlo average 

N 



v ( y ,x ) + ^J2^(y,- 



1=1 



Clearly, the main challenge in the implementation of Algorithm 4.1 is the solution of 
the optimization problem in step 3. 

Remark 4.2. The particular implementations of the above algorithm differ in 

(i) the specification of the domain Dom(Ft(-,x t )), 

(ii) the definition of Q%, 

(iii) the approximation of the solution to the optimization problem in point [3] of the 
algorithm, 



(iv) and the method applied in point [4] of the algorithm. 



Two possible versions of Algorithm |4. 1| are presented in Sections 4.1.1 and 4.1.2 

4.1.1 Implementation I: Discretization of the control 

One possible approach is to discretized the problem in the control. We define Qq as 
an (equidistant) grid contained in the set of initial control values of interest. Then, 
recursively for t = 0, . . . , T — 1, we define Gt + \ to be an (equidistant) grid contained in 
the set 

{y + h\yeGl heK t (y,x t )}. 

Furthermore, Dom(i ? ((-, x?)) is defined to be the same as Q\\ this specification implies 
that the optimization problem in step [3] of Algorithm 4.1 is an optimization over a finite 
set; moreover, F t (-, •) = F t (-, •) for t = 0, . . . ,T. 

Remark 4.3. The choice of Q\ depends on the constraints of the problem. For instance, 
in the case of the gas storage problem, there is a well defined lower and upper limit of y; 
Q\ can be an equidistant grid in this region. 

4.1.2 Implementation II: Parametric curve fitting 

Here, we define Q\ similarly to the previous version. However, we assume i*t(-, •) to be a 
parametric surface of the following form. 



F t (y,x i ) = ]T\i }r <t> r (y) 



r=l 



for some vector of parameters K\ = (\\ 1; . . . , \\ R ) depending on the trajectory x l and 
for some set of test functions (</>i, . . . , <fin) with domains in M. 1 , implying 



R 



Dom(F t (.,i!)) = f|Dorm> r ). 



r=l 



10 



The accuracy of the algorithm is sensitive to the choice of test functions; more specifically, 
different settings may have different optimal sets of test functions, and the (numerical) 
solution of the optimization problem in step [3] of Algorithm 4.1 should be adapted to the 
particular choice of test functions. 

Point [4] of Algorithm 4.1 is implemented via a least squares regression, i.e., we define 
K\ to minimize the expression 



E 



R 



F t (y,x l )-J2K,rMy) 



r=l 



Remark 4.4. As we increase the number R of independent test functions and the number 
N of simulated trajectories, we anticipate that F t (-,-) converges to F t (-,---) for t = 
0, . . . , T. However, Ft(-, •) is likely to estimate Ft(-, • • • ) from below, and, therefore, our 
method is likely to result in a low-biased estimate of the dual formulation based upper 
bound. 



4.2 An a priori estimate 



As stated at the beginning of Section 4.1, the solution of (4.2) requires computable 
functions Vt(y,x) and 



G t (y,x) :=E[V t+1 (y,X t+1 )\X t = x] 



approximating V t '* (y, x) and E 



V t +'*(y, X t+ i)\X t = x , respectively, for t = 0, . . . ,T-1. 
We suggest the following method, which is based on the dynamic programming formula- 
tion. 



Definition 4.5. (Dynamic Programming Formulation) 



V t K >*(y,x) :-- 



^VheK T ( y , x )H T (h,x) 



V t f*(y-h,X t+1 )\X t = x\] 



ift = T, 

ifO < t < T - 1. 
(4.4) 



For the computation of the conditional expectation in the above formulation, we in- 
troduce a slightly extended version of the standard least squares regression based Monte 
Carlo method [131 ED E]. Our construction yields an a priori estimate Vo(-, •) that ap- 
proximates Vq '*(•, •) for a bounded set S of initial Xq values, where S is contained in the 
support of the law of Xq. 



x 



Algorithm 4.6. Define a set Qq of distinct initial values of X in S (see Section 4-2.1) 
and generate independent trajectories x), i = 1,...,N, of the process X, with x 
for each x £ Q%. Define Qf = {x\ \ 1 < i < N} for t = 1, . . . , T, where N = 
Furthermore, for t = 1, . . . , T, define a finite set 



grQDom(V t (-,-))C 



l l x R d 



such that, for all (y,x) G Q\ , we have x G Qf (see Remark Then, proceed as 

follows. 
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1. Set t = T and define 

V T (y,x)= sup H T (h,x). 

h£K T (y,x) 

2. Set t - 1 -> t. 

3. Given the set 

{(y,x t ,x t+ i,V t+1 (y,xt + i)) | (y, x t+1 ) G G^} , 
define a function Gt(y,x) approximating Gt(y,x) on Dom(Gt(- , •)) fsee Remark 



J7§. 

4- For each (y,x) S C/f 1 , sofce £/ie optimization problem 

V t (y,x)= sup + (4.5) 

y-yeK t (y,x) 

(y,x)£Dom(Gt(-,-)) 

5. Vt(-,-) is on/y defined on Q\ x ■ Given the set 

{{y,x,V t (y,x)) | (y, x) g Gf x } , 
define the function Vt(-, •) on z£s domain (see Remark \4- 



6- If t > 1, i/ien continue with step 2; else, Vo(y,x) results in an a priori approxima- 
tion. 



The above outline of Algorithm 4.6 leaves some choice as to how certain things are 



done in detail; in particular, this includes the following points. 



Remark 4.7. The particular implementations of Algorithm 4.6 differ in 



(i) the construction of the set Qf x , 



(ii) the construction of the function Gt(-,-) m step[4j 

(iii) and the construction of the function Vt{-, •) in stepp] 



In Section 4.2.1 we implement a particular version of Algorithm 4.6 for the a priori 
estimate. 

4.2.1 Choosing an implementation 



Since the three items described in Remark 4.7 are closely connected, we discuss them 
together. 

In a similar way to the classic least squares regression based approach (cf. [T3j EH |9] ) , 
we approximate the function Gt(-,-) by an orthogonal projection onto a function space 
spanned by a set of test functions {^i, . . . , i/jq}, where, for q = 1, . . . , Q, ip q is defined on 
Dom(G t (-, •)) Q x R d , and 

Q 

G t {y,x) = Y,ltMy,x) « G t (y,x) = E [V t+1 (y, X t+1 )\X t = x] . (4.6) 

9=1 
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E 



(4.7) 



In contrast to \13\ \21\ |9], where the orthogonal projection at time t is determined by 
the distribution of Xt, we have to deal with the control variable as well. We define the 
projection to minimize 

y.z | E[V t+1 (Y,X t+1 )\X t = Z\- Y,lt,MY } Z) 

9=1 

where Z and Y are independent random variables. In most applications, the set reachable 
by yt+i is bounded, and, therefore, in order to try to obtain uniform accuracy across the 
reachable set, we will take Y to be uniformly distributed on this bounded set. The 
distribution of Z can be defined to coincide with the distribution of Xt. However, in 
many applications, such as the numerical example in Section [5] only the distribution of 
Xt conditioned on particular values of X$ is specified; here, we assume that the law of 
Xq is uniform on a certain set S. 

Formula (4.7) suggests that, by increasing the number of appropriately chosen test 
functions ipi,ip2, ■ ■ ■ , Gf(-, •) approximates the conditional expectation 

(y,x)^E[V t+1 (y,X t+1 )\X t = x] 

in the mean square sense with respect to the joint measure of Y and Z; for our particular 



choice of test functions, see Section 5.2 



In order to determine the regression coefficients jt,q for q = 1, . . . , Q, we observe that, 



when (|4.7|) is minimized, we have 




-Ey> 



Q 



E[V t+1 (Y,X t+1 )\X t = Z\- Y,lt,MY,Z) 

9=1 



2E YjZ [E[V t+l (Y,X t+1 )\X t = Z]MY,Z)] - 2E Y<Z 



Q 



Y,lt,MY,Z)^ r {Y,Z) 



9=1 



Q 



(4i 



= 2E Y , z [V t+ i(Y,Xt +1 )MY,Z)] -2z~2lt, q ^Y,z IMY,Z)MY,Z)} = 0- 

9=1 

Hence, jt = (lt,i, ■ ■ ■ i7t,Q) T satisfies the linear equation 
where 

By^ = E Y ,z [V t +i(Y, X t+ i)ip{Y, Z)} 

and 

B i , = E Y , z ty(Y,Z)4,(Y,Z) T ] 

for i/](x,y) = (i) 1 (x,y),...^ Q (x,y)) T . 

When estimating the regression coefficients, we replace By^ and in (4.8) with 
their Monte-Carlo estimates 



,xt + i)ip(y,x t ) 



and B^, :— jgwj 



E ^(y,xt)4>(y,x t ) T . 



(4.9) 
(4.10) 



13 



The choice of Qq and Qf x , t = 0, ...,T, determines how accurately By^ and 
approximate By^ and B^, respectively. When implementing the method, we consider 

i) Qfi to be randomly sampled from the law of Xq, or Qq to be a low discrepancy 
sequence in S, 

ii) xt to be randomly sampled from the conditional distribution X^Xq, 

iii) and y to be independent of xt and randomly sampled from the uniform distribution 
on the support of yt, or to be a low discrepancy sequence^] in the support of yt\ we 
generated a small number (1 to 10) of y items for each x. 

Remark 4.8. Initially, we looked at defining Qf = Q\ x Qf, for some set Q\. However, 
the numerical results showed that to achieve a given accuracy, a large enough Q\ is 
required, resulting in a set Q\ x Qf significantly larger than the size of Q\ x constructed 
in the version described prior to this remark (calibrated to yield the same accuracy). 

Figure [^demonstrates the difference between Q\ x Qf and the set Q\ x described before 
this remark. We observe that Q\ x yields a better coverage with fewer grid points. 




X 



(a) Q\ x grid, y points generated rank-1 lattice rule (b) Q\ x — Q\ xQf (12500 points in total, 25 y-items 
(4000 points in total, 8 y-items per each x item) per each x item) 



Figure 1: Different constructions of Q\ based on the same Qf , assuming equidistant 
Qq C [—0.5,0.5] grid and Gaussian conditional distribution (Xt\Xo). 



What remains is to be specified are the particulars of step [5] of Algorithm 4.6, i.e., to 
define Vt(-, •) given 

{(y,x,V t (y,x)) | (y,x)e(% x }. 

To do this, one can use interpolation, or one can fit a parametric surface to the graph of 
Vt(-, -)j we consider the parametric representation 

Q 

Vt(y,x) = ^p t ,q^q{y,x), 

9=1 



4 We tested rank-1 lattices, see Section [H] 
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choosing q , q = 1, . . . , Q, to minimize the mean square error 



/_ Q 

^2 \ v t(y, x ) -^Pt,qtp q (y,x) 

i.e., we define Vt(-, •) by another least squares regression. 

Remark 4.9. For the numerical computation of the dual formulation based approach, 
to ensure that we have an upper bound, it is essential that the estimate of 

V t (y, x) - E[V t+1 (y, X t+1 )\X t = x] (4.11) 

is a martingale increment, that is it has zero expectation. 
As introduced in this section, the function 

V t (y,x)-G t (y,x) (4.12) 



is a biased estimate of (4.11). When (4.12) is used for computing the dual upper bound - 
in particular, when Gt(-, •) is a poor estimate of Gt(-, •) - the error due to this bias might 
be larger than the statistical error. 

In such cases, one can replace Gt (•, •) with the following estimate 

L 

G t { v ,x) = iY. v ^y^ x tii) 

i=i 

where {A^, i = 1, . . . , L} for some L is an i.i.d. sample from the conditional distribution 
of (X t+ i\x). 

4.2.2 Other choices in the implementation 



Multivariate regression similar to (4.6) has been mentioned in [7J. However, [7J does not 
pursue the same route as presented above, but rather restricts attention to a least squares 
regression that uses test functions depending only on the underlying factor X, and, for 
each value of y in a finite set Q y C i', a separate simpler regression is computed. The 
extension of Vf(-, •) to the whole domain of Vt(-, •) is not considered (step [5] in Algorithm 
4.6). [7J restricts the optimization problem ( |4.5[ ) in step [5] to Q y . 

Computing regressions which are based on test functions depending only on X is 
less expensive than a regression with high number of (y, AT)-dependent test functions 



(see Section 5.2 for the implementation of the a priori method). However, for accurate 
estimates, a fine grid Q y is required, and, hence, a high number of simple regressions needs 
to be computed. With carefully chosen C/^-grid (see Remark 4.8) and a suitable set of 



(y, X)-dependent test functions, our version attains the same accuracy at significantly 
lower cost. 

4.2.3 A note on low biased methods 

The a priori estimates Vt(y, x) and E[T4+i(y, A 4+ i)| X t = x], described above, typically 
result in a high biased estimate of the value function. However, the outcome of the a 
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priori method can be applied to generate a low biased estimate. By definition, for any 
yo G Y K and for any policy tt G VK,y ,o, V '^(yo, •) is a low biased estimate of V Q K '*(yo, •). 

The a priori estimate generates a policy as follows. For t G {0, . . . , T — 1}, a reachable 
pair (y,x), and e > 0, there exists at least one value ht G K t (y,x) that satisfies 

H t (ht,x) +E[V t+1 {y - ht,X t+ i)\X t = x] 

sup {H t (h,x) + E[V t+1 (y-h,X t+1 )\X t = x]}-e. (4.13) 

h£K t (y,x) 

Given a starting value yo G ^o^i an d assuming that the supremum exists almost surely 
for reachable pairs, ( |4. 13 ) determines a exercise policy tt = (ho,---,hr) G VK,y ,o- As 
e approaches 0, V K,V (-,-) converges to V^'* (■,■). This convergence result motivates the 
following low-biased algorithm. 

Algorithm 4.10. Fix yo G Y and e > 0. Generate N independent trajectories x], 
i = 1, . . . , N , of the process X started at a fixed Xq. For i = 1, . . . , N , 

1. set t = 0, and V i = 0, 



2. for x = x\ and y = yt, find a ht that satisfies (4.13), 

3. set Vt + Htihtixi) V\ 

4- and, if t = T, then stop; else, set t + l—it, and continue with step\^ 
Once this routine has been executed for all i = 1, . . . , N, the Monte-Carlo average 

N 



V l (y ,X ) :=^J2 V 



i=l 



approximates (up to statistical error due to sampling variance) a low biased estimate at 
time for initial control value yo and initial factor value x. 



5 Numerical results 

In this section, we discuss the gas storage example following [20J, and we compare the 
numerical performance of the implementation of both, the a priori method and the method 
based on the dual formulation. 

The gas storage problem as well as related probabilistic numerical methods have also 
been discussed in [H] and in [7]. 

5.1 The gas storage problem 

The natural gas storage problem addresses the optimal utilization of certain types of 
storage facilities. We assume relatively high deliverability and high injection rates. In 
particular, given the price Xt of gas and the amount yt of working gas in the inventory 
at time t, we aim to optimize the production (injection) amount for the given day, for 
each day over a year. 

We introduce the following notation. 
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300 ~| 




Gas price ($/MMBtus) Storage level ( lOOMMcf) 



Figure 2: A priori estimate of the value of option at time 0. 




Gas price ($/MMBtus) Storage level (lOOMMcf) 



Figure 3: A priori estimate of the optimal rate of production at time 0. 
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• c, the rate of production if c > 0, or the rate of injection if c < 0. The rate is 
measured in million cubic feet per day (MMcf/day). 

• Ubasei base gas requirement (built into the facility and cannot be removed). 

• Umax, the maximum storage capacity of the facility on top of the base gas level. 

• c max (y), the maximum production rate at storage level y. 

• c m i n (y), the maximum injection rate at storage level y. 

• a(y,c), the rate of gas that is lost given production at rate c > or injection at 
rate c < 0. 

• r, the discount rate. 

As in [2D], W6 consider a facility with working gas capacity of ymax — 

2000MMcf and 

with base gas requirement ybase = 500MMcf. The maximum production rate (attainable 
at maximum capacity) is known to be c max (y max ) = 250MMcf/day, whereas the maximum 
injection rate (attainable at minimum capacity) is c m ; n (0) = — 80MMcf/day. The facility 
is available for one year, and a decision on gas production/injection is made daily, i.e., 
T= {0,1,..., 365}. 

We assume that the loss rate satisfies 

f if c > 0, 
a(y ' c) = a(c) = \ 1.7 ifc<0. 

In the discrete-time formulatioij^J we approximate the daily delivered/injected amount 

by 

ht = yt- yt+i ~ cAt, (5.1) 

i.e., the unit of time is assumed to be a day (including weekend days), which means 
At = 1. 

The daily constraints on gas production and injection are derived from the ideal 
gas law and Bernuolli's law (the reader is referred to Section 3 in [20] for details^]). In 
particular, 

Cma X (y) = C 0y /y, (5.2) 
where C = c max (y max )/^/y max . Moreover, 



AU)--C\\I- . +C 2 , (5.3) 

V + 2/base 



where C 2 = -l/(y m ax + 2/base) and 



Ct = c min (0)/ A /— + C 2 . 

2/base 



5 In [2D], the continuous time production/injection is described by an ordinary differential equation. 
The discrete-time formulation is an approximation of the solution to that ODE. 

6 Note that, in this paper, the time unit is daily, whereas in [2D] the time is measured in years. 
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Combining (5.2) and ( |5.3[ ) with (5.1), we get the constraint set for the amount of gas 
that can be produced/injected during a day: 

K t (y,x) = K(y) = [- min{c min (y)At, y max - y},imn{c max (y)AT,y}]. (5.4) 

The payoff function is defined by H t (-,-) = for t = T, and 

y . f e- H h t X t if/l>0, 

** ( *" * t} " I e^fa - a(^)At)^ if h t < 0, (5 ' 5) 

for t = 0, . . . ,T — 1, incorporating the value of the loss of gas at injection. 
The discount rate is assumed to be 10%. 

In practice, gas prices are quoted in "dollars per million British thermal units" 
($/MMBtus). We note that 1000 MMBtus are roughly equivalent to 1 MMcf. 
The calculations in [20] are based on the gas price model 

dX t = a((3 - X t )dt + jX t dB t + (J t - X t )dq t , (5.6) 

where t \— ¥ qt is a Poisson process with intensity rate A and independent of the Brownian 
motion Bt . Moreover, Jt is normally distributed with mean \i and variance a 2 independent 
of Bt and qt- In our implementation, we rescaled the parameters of [20] to daily time-scale: 
a = 0.25/365, = 2.5, 7 = 0.2/^/365, A = 2/365, /x = 64, and a 2 = 4. 

Remark 5.1. Since the payoff function is piece- wise linear in h and the constraints sets 
are bounded (uniformly in t) for any iT-admissible policy ir, the following bounds are 
satisfied for all t G T, x € M + , y G [0,y max ]. 

T 

- 00 < (-c min (0) - a(-l))At^2E[X s \X t = x] 

s=t 

T 

< V t w (y,x) < c max (y max )At^E[X s |X t = x] < 00. 

s=t 

These inequalities imply that the value function is well defined, and the dynamic pro- 
gramming principle holds for this particular formulation of the gas storage problem. 



5.2 The a priori estimate 

We computed the a priori estimate as follows. 

First, we ran the method using an equidistant initial grid Q$ in the price region 
[0, 12] of interest ([20J presents results in this price interval). However, we found that the 
absolute value of the second derivative of Vb(-, ) with respect to gas price was large in the 
price interval [5,7], and close to zero otherwise; therefore, we decided to refine the grid 
in the middle region. In particular, we chose an initial grid Qq that had 2500 equidistant 
points in the interval [0,5], 5000 equidistant points in [5,7], and 2500 equidistant points 
on [7,12]. 

The gas price trajectories x] for i = 1, . . . , 10000 were simulated using the Euler 
time-discretisation 

x\ +l =x\ + a(f3 - 4)At + jxlABl + ( J\ - x\)Aq\, 
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Range of 




X 


vS(yo,X )-Vi(y ,X ) 


stdev(F t (yo,X )) 


stdev(V^(y ,X )) 


3 


[1.224,3.781] 


[0.115,0.121] 


[0.188,0.208] 


6 


[1.758,3.677] 


[0.116,0.128] 


[0.121,0.133] 


9 


[2.174,4.276] 


[0.060,0.076] 


[0.115,0.118] 



Table 1: Comparison of high-biased and low-biased estimates: ranges of differences 
and ranges of estimated standard deviation over the domain yo G [0, 20] measured in 
$/MMBtus. 




Figure 4: Comparison of three methods, Xq = 3$/MMBtus. 




Figure 5: Comparison of three methods, Xq = 6$/MMBtus. 
where ABl are independent Brownian increments on a unit time step (At = 1), J\ are 
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drawn from the distribution of Jt, and Aql drawn from the distribution 



with probability 1 — AAi, 

1 with probability A At. 



In order to generate the grid Gt, at each time step, we generated a low discrepancy 
sequence (using a rank 1 lattice rule with random offset, see [TT]) of length \Qf\ x N y , 
and assigned N y y-points to each of the elements in Qf. We tested the method with 
N y = 3,6,21. 

Initially, we considered using polynomial test functions for the regression. However, we 
found that these test functions did not capture well neither the conditional expectation 
function nor the value function. Therefore, we decided to use test functions that are 
polynomial on patches and constant outside the patches. We partitioned the (y, x) domain 
[0, 20] x [0, 12] into smaller rectangles 

[0, 10] x [0, 5] [10, 20] x [0, 5] 
[0, 10] x [5, 7] [10, 20] x [5, 7] 
[0, 10] x [7, 14] [10,20] x [7,14]. 

On each rectangle, we used the following polynomials: 1, x, y, x 2 , y 2 , xy, x 2 y, y 2 x, 
and x 2 y 2 . In addition to these polynomials, on the patches in the second row, we also 
used x 3 , x 3 y, and x 3 y 2 . Although defining functions locally on small rectangles leads 
to a relatively high number of test functions, the matrix is sparse, and the evaluation is 
tractable. 

In step[4]of Algorithm 4.6 we simply compared the outcome of three scenarios: h = 0, 



h = mmK t (y,x), and h = maxK t (y,x); i.e., we assumed bang-bang controls. We also 
tested replacing the supremum with the maximum over finer grids in Kt(y,x); however, 
these tests did not result in significantly different option values. 

The numerical results corresponding to t = 0, JVy = 6, and bang-bang controls are 
shown in Figures [2] and [3j Comparing these figures to the plots on page 235 in [20] , we 
find that our a priori method slightly overestimates the option value. Given that, in order 
to estimate the values at time 0, the a priori method uses information from later times, it 
is likely to be a high biased method (see comments on the least squares regression based 
methods in |11|). 

5.3 The dual upper bound 

We implemented the version of the method based on the dual formulation as speci- 



fied in Section 4.1.2 for three different initial gas prices (3$/MMBtus, 6$/MMBtus and 
9$/MMBtus). In each case, we generated N = 10000 gas price trajectories. For Q\, we 
used a fixed equidistant grid in [0, 20] with N y = 320 points. 

For the parametric curve fitting component, we partitioned the control interval [0, 20] 
into three shorter intervals ([0, 7], [7, 14], and [14, 20]), and on each small interval we used 
the following polynomials as test functions: 1, y, y 2 , and y 3 . 

In order to compute the optimization in step [3] of Algorithm 4.1 we approximated 



the supremum with the maximum on a finite grid in Kt(y,x). This grid can be chosen 
to be finer than Q\. With other optimization techniques, even more accurate estimates 
can be computed. 
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In order to estimate the accuracy of the method, we ran the algorithm using finer Q\ 
grids but the same set of gas price trajectories, more test functions defined locally on finer 
partitions, and more accurate optimization. Since the refined specifications resulted in 
absolute differences that were around 10% — 15% of the standard deviation of the results, 
we consider the refined estimates numerically equivalent to our reference results. 

We also computed low biased estimates following the method described in Section 



4.2.3 using the a priori value functions and a sample of 50000 gas price trajectories. 



The results are given in Table [T] and plotted in Figures [4j [5j and |6j For each case 
(Xq = 3, 6, 9), Table [T] describes the range of differences of the high-biased and low-biased 
estimates over the range of control y £ [0,20]. We also provide the range of estimated 
standard deviations to indicate the order of magnitude of the statistical error. We note 
that a conservative upper and lower bound can be computed by adding three times 
its standard deviation to the upper estimate and subtracting three times its standard 
deviation from the low-biased estimate. 

Figures |4j [5j and [6] suggest that, in some cases, the dual formulation method based 
estimate results in a sharper upper bound compared to the estimates of the a priori 
method. The upper and lower estimates are consistent with the numerical results of 




Figure 6: Comparison of three methods, Xq = 9$/MMBtus. 
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